Abstract

Consider a random vector $(X', Y)'$, where $X$ is $d$-dimensional and $Y$ is one-dimensional. We assume that $Y$ is subject to random right censoring. The aim of this paper is twofold. First, we propose a new estimator of the joint distribution of $(X', Y)'$. This estimator overcomes the common curse-of-dimensionality problem by using a new dimension reduction technique. Second, we assume that the relation between $X$ and $Y$ is given by a single index model, and propose a new estimator of the parameters in this model. The asymptotic properties of all proposed estimators are obtained.

1 Introduction and model



Consider a random vector $(X', Y)'$, where $X = (X^{(1)}, \ldots, X^{(d)})'$ is $d$-dimensional and $Y$ is one-dimensional. We assume that $Y$ is subject to random right censoring, i.e. instead of observing $(X', Y)'$, we observe the triplet $(X', T, \delta)'$, where $T = Y \wedge C$, $\delta = I(Y \leq C)$, and the random variable $C$ is the censoring variable. Typically, $Y$ is (a transformation of) the survival time (whose range can span the whole real line), and $X$ is a vector of characteristics. The data consist of $n$ i.i.d. replications $(X_i', T_i, \delta_i)'$ of $(X', T, \delta)'$.
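The observation scheme can be illustrated with a minimal Python sketch. The data-generating choices below (three uniform covariates, an additive Gaussian response, a shifted exponential censoring time) and the function names are purely hypothetical and are not taken from the paper; only the construction of $T = Y \wedge C$ and $\delta = I(Y \leq C)$ follows the text.

```python
import random


def observe(y, c):
    """Return the observed pair (T, delta) for a response y and a censoring time c."""
    t = min(y, c)                 # T = Y ^ C
    delta = 1 if y <= c else 0    # delta = I(Y <= C)
    return t, delta


def simulate_sample(n, d=3, seed=0):
    """Draw n i.i.d. triplets (X_i, T_i, delta_i) from a toy, illustrative design."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = [rng.random() for _ in range(d)]   # hypothetical covariates on [0, 1]
        y = sum(x) + rng.gauss(0.0, 1.0)       # hypothetical response (may be negative)
        c = 1.0 + rng.expovariate(1.0)         # hypothetical censoring variable
        t, delta = observe(y, c)
        sample.append((x, t, delta))
    return sample
```

Only `(x, t, delta)` is returned for each unit: the uncensored response `y` and the censoring time `c` are never both available to the statistician.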

Under this setting, the purpose of this paper is twofold. First, we propose a new estimator of the joint distribution $F(x, y) = P(X \leq x, Y \leq y)$ of $X$ and $Y$ (where $X \leq x$ means that $X^{(j)} \leq x^{(j)}$ for $j = 1, \ldots, d$). Second, we assume that the relation between $X$ and $Y$ is given by a single index mean regression model (as in e.g. Härdle, Hall and Ichimura, 1993), and we propose new estimators of the parameters under this model. These estimators will be constructed under the following fundamental model assumption on the relation between $Y$ and $C$, which we impose throughout this paper:

(A0) There exists a function $g : \mathbb{R}^d \to \mathbb{R}$ such that:

(i) $Y$ and $C$ are independent, conditionally on $g(X)$;
(ii) $P(Y \leq C \mid X, Y) = P(Y \leq C \mid g(X), Y)$.

The function $g$ will be unknown in general. When $g$ is known, this assumption has been proposed by Lopez (2007a). The assumption is needed for identifying the model. In the literature on nonparametric censored regression, alternatives to assumption (A0) have been proposed. There are basically two alternatives, which can be regarded as limiting cases of assumption (A0), and in that sense our assumption is a trade-off between these two. The first alternative has been used by e.g. Akritas (1994) and Van Keilegom and Akritas (1999), among many others. They assume that $Y$ is independent of $C$, conditionally on $X$, and propose kernel type estimators of the distribution $F(x, y)$ under this assumption. This assumption is a particular case of (A0), obtained by taking $g(X) = X$. Their estimators are however restricted to the case where $d = 1$. Although they could in principle be extended to higher dimensions, this is not recommended in practice, since they will suffer from the curse of dimensionality and higher order kernels will need to be used. The second alternative to assumption (A0) has been proposed by Stute (1993, 1996). He assumes that $Y$ is independent of $C$, and that $P(Y \leq C \mid X, Y) = P(Y \leq C \mid Y)$. This is again a particular case of (A0), obtained by taking $g(X) = 1$. Although his estimator can be used for any $d \geq 1$, it has the drawback that it assumes that the censoring variable $C$ depends on $X$ in a very particular way. This type of dependence might hold true when the censoring is purely 'administrative' (censoring at the end of the study), but when the censoring can be caused by other factors (like death due to another disease, change of treatment, ...), then less restrictive assumptions on the censoring mechanism are required.

Our assumption (A0) lies in between these two extreme assumptions. By imposing assumption (A0), we propose a new dimension reduction technique, which overcomes the drawbacks of these two classical sets of assumptions by allowing for $d > 1$ without assuming complete independence between $Y$ and $C$. Note that assumption (A0) holds in the particular case where $\mathcal{L}(C \mid X, Y) = \mathcal{L}(C \mid g(X))$. By assuming that the censoring variable depends on $X$ only through a one-dimensional variable $g(X)$, we avoid the curse-of-dimensionality problems which strike regression approaches where $X$ is multivariate and $Y$ is independent of $C$ conditionally on $X$, and at the same time the dependence of $C$ on $X$ is not too restrictive. A related dimension reduction model assumption for the censoring time has been considered in Section 4 of Li, Wang and Chen (1999).

In some cases, the function $g$ will be known exactly from some a priori information. For example, we might know that the censoring only depends on one component of $X$, for example $g(X) = X^{(1)}$. Lopez (2007a) proposed an estimator of the joint distribution $F(x, y)$ when $g$ is supposed to be known. However, in many other cases, $g$ will be unknown and needs to be estimated. Throughout this paper, we will assume that

$$ g \in \mathcal{G}, \quad \text{where } \mathcal{G} = \{x \mapsto \lambda(\theta, x) : \theta \in \Theta\}, \qquad (1.1) $$

where $\lambda$ is a known function, and $\Theta$ is a compact parameter set in a finite-dimensional Euclidean space. The true (but unknown) value of $\theta$ will be denoted by $\theta_0$.

Throughout the paper, we will assume that we know some root-$n$ consistent estimator $\hat\theta$ of $\theta_0$ that satisfies the following:

(C0) The estimator $\hat\theta$ satisfies

$$ \hat\theta - \theta_0 = \frac{1}{n} \sum_{i=1}^{n} \mu(T_i, \delta_i, X_i) + o_P(n^{-1/2}), $$

with $E[\mu(T, \delta, X)] = 0$ and $E[\mu(T, \delta, X)^2] < \infty$.

Hence, the set $\Theta$ can from now on be taken equal to an arbitrarily small neighborhood of $\theta_0$.

To illustrate the nature of assumptions (A0) and (C0), consider the function $g(x) = \theta_0' x$, and the case where $C$ follows a Cox regression model given $X$, in the sense that the conditional hazard $h(\cdot \mid x, y)$ of $C$ given $X = x$ and $Y = y$ satisfies

$$ h(c \mid x, y) = h_0(c) \exp(\theta_0' x) $$

for some baseline function $h_0$ only depending on $c$. Note that this model assumption on $C$ is not unrealistic, since the censoring variable $C$ often itself represents a lifetime, like the time until a patient dies from a disease other than the disease under study. Under this model, we clearly have $\mathcal{L}(C \mid X, Y) = \mathcal{L}(C \mid \theta_0' X)$, and the estimator $\hat\theta$ proposed by Andersen and Gill (1982) satisfies condition (C0), with

$$ \mu(t, \delta, x) = \Sigma^{-1} \left\{ (1 - \delta)\phi(x, t) - \int \phi(x, u) 1_{t \geq u} [1 - G(u- \mid x)]^{-1} dG(u \mid x) \right\}, $$

where the matrix $\Sigma$ is defined by condition D in Andersen and Gill (1982),

with $H(t \mid x) = P(T \leq t \mid X = x)$ and $G(c \mid x) = P(C \leq c \mid X = x)$. See also Gørgens and Horowitz (1999) for regression models more general than Cox in which $\mathcal{L}(C \mid X, Y) = \mathcal{L}(C \mid \theta_0' X)$. Alternatively, one could also assume that $C = r(\theta_0' X) + U$, where $r(\cdot)$ is given, $E(U) = 0$, and $U$ is independent of $X$ and $Y$. For the estimation of $\theta_0$ and the verification of condition (C0) under this model, see e.g. Akritas and Van Keilegom (2000) and Heuchenne and Van Keilegom (2007).

To summarize, this paper makes two contributions. The first consists in proposing and studying a new nonparametric estimator of the joint distribution of $X$ and $Y$ under assumption (A0). Under different sets of assumptions on the relation between $X$, $Y$ and $C$, this distribution has been the object of study of many papers in the past. See e.g. Akritas (1994), Stute (1993, 1996), Van Keilegom and Akritas (1999), among others. As mentioned before, assumption (A0) makes it possible to avoid both the curse-of-dimensionality problem present in some of these contributions and the heavy assumptions on the relation between $C$ and $X$ that are present in many others.

The second contribution of this paper is the estimation of a semiparametric single index regression model for the censored response $Y$ given $X$ under assumption (A0). The proposed estimator is based on a two-step procedure, in which first a preliminary (consistent) estimator is obtained, which is then used to obtain in a second step a more accurate estimator. Both steps heavily rely on the estimator of $F(x, y)$ studied before. Note that in this second contribution two dimension reduction techniques are used: the first one comes from assumption (A0), which is concerned with the relation between $Y$ and $C$, and the second one comes from the single index model, which makes a hypothesis on the relation between $Y$ and $X$.

Single index regression models are now a common semiparametric multivariate explanatory approach; see for instance Delecroix, Hristache and Patilea (2006) for a review. However, the literature on single index models with a censored response variable is rather sparse. To the best of our knowledge, the only contribution that allows for a general relationship between the censoring variable and the covariates is Li, Wang and Chen (1999), and it is based on sliced inverse regression (SIR). However, it is well known that the SIR approach requires a linear conditional expectation condition among the covariates, which may be rather restrictive in some applications; see equation (2.3) in Li, Wang and Chen (1999).

Lopez (2008) proposed a semiparametric least squares estimator for the single index regression in the particular case where $g(X) = 1$ in assumption (A0). A similar procedure was introduced by Wang et al. (2007) under the stronger assumption that $C$ is independent of $(X', Y)'$. See also Lu and Cheng (2007). Lu and Burke (2005) used the same more restrictive condition to define an average derivative estimator of the index. It is worth noting that these three contributions involve a Kaplan-Meier estimate of the censoring distribution, while in general assumption (A0) requires a nonparametric estimate of the conditional distribution of $C$ given $g(X)$.

This paper is organized as follows. In the next section the estimators of the joint distribution and of the parameters in the single index model are explained in detail. Section 3 is devoted to the presentation of the asymptotic results of the proposed estimators. Section 4 reports a simulation study. Finally, Appendix A contains the assumptions under which the results of Section 3 are valid, while Appendix B contains some technical lemmas and the proofs of the main results.

2 The estimators 

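2.1 Estimation of the distribution $F(x, y)$

We first explain how to estimate the joint distribution $F(x, y)$ of $X$ and $Y$. For an arbitrary value of $\theta$, let

$$ G_\theta(t \mid z) = P(C \leq t \mid \lambda(\theta, X) = z), \qquad (2.1) $$

and define

$$ \hat G_\theta(t \mid z) = 1 - \prod_{T_i \leq t} \left( 1 - \frac{w_{in}(z; \theta)}{\sum_{j=1}^{n} w_{jn}(z; \theta) 1_{T_j \geq T_i}} \right)^{1 - \delta_i}, \qquad (2.2) $$

where

$$ w_{in}(z; \theta) = \frac{K\left( \frac{\lambda(\theta, X_i) - z}{a_n} \right)}{\sum_{j=1}^{n} K\left( \frac{\lambda(\theta, X_j) - z}{a_n} \right)}. $$

Here, $a_n$ is a bandwidth sequence converging to zero as $n$ tends to infinity, and $K$ is a probability density function (kernel). Note that $\hat G_\theta(t \mid z)$ reduces to the estimator proposed by Beran (1981) when $\lambda(\theta, X)$ is equal to $X$.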
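As an illustration, a kernel-weighted product-limit (Beran-type) estimator of the conditional censoring distribution can be sketched as follows. This is a minimal sketch and not the authors' implementation: the Epanechnikov kernel, the function names, the `strict` switch for left limits, and the assumption that there are no ties among the censored times are all choices made here for illustration. The argument `index` holds the one-dimensional values $\lambda(\theta, X_i)$.

```python
def epanechnikov(u):
    """Epanechnikov kernel (an illustrative choice of probability density K)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def beran_censoring_cdf(t, z, index, times, deltas, a_n,
                        kernel=epanechnikov, strict=False):
    """Kernel-weighted product-limit estimate of G(t | z) = P(C <= t | index = z).

    With strict=True only observed times T_i < t enter the product,
    which gives the left limit G(t- | z).
    """
    k = [kernel((zi - z) / a_n) for zi in index]
    total = sum(k)
    if total <= 0.0:
        raise ValueError("no observation falls in the kernel window around z")
    w = [ki / total for ki in k]                 # normalized kernel weights
    surv = 1.0
    for i, (ti, di) in enumerate(zip(times, deltas)):
        counted = ti < t if strict else ti <= t
        # Censored observations (delta = 0) are the 'events' for C.
        if counted and di == 0 and w[i] > 0.0:
            at_risk = sum(wj for wj, tj in zip(w, times) if tj >= ti)
            surv *= 1.0 - w[i] / at_risk
    return 1.0 - surv
```

When all kernel weights are equal, the function reduces to the ordinary Kaplan-Meier estimator of the censoring distribution, which is a convenient sanity check.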

With the estimator $\hat\theta$ introduced in condition (C0) at hand, and the corresponding estimator $\hat g(x) = \lambda(\hat\theta, x)$ of $g(x)$, we now define the following estimator of $F(x, y)$:

$$ \hat F_{\hat g}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \frac{\delta_i 1_{T_i \leq y, X_i \leq x}}{1 - \hat G_{\hat\theta}(T_i- \mid \hat g(X_i))}. \qquad (2.3) $$

Note that this estimator is in the same spirit as the estimator proposed by Stute (1993, 1996), but the denominators of the two estimators are different, because of the different sets of underlying assumptions. See also Fan and Gijbels (1994) for a similar weighting scheme in a nonparametric regression framework. Also note that when $g$ is known, this estimator equals the estimator proposed and studied in Lopez (2007b).

In Section 3.1 we will study the asymptotic properties of the estimator $\hat F_{\hat g}(x, y)$.
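The weighting scheme behind this estimator can be sketched in a few lines of Python. The sketch assumes that the values `g_left[i]`, estimates of the censoring distribution evaluated at $T_i-$ given the estimated index of $X_i$, have already been computed by some conditional product-limit estimator; the function name and argument layout are hypothetical.

```python
def joint_cdf_estimate(x, y, covariates, times, deltas, g_left):
    """Inverse-censoring-weighted estimate of F(x, y) = P(X <= x, Y <= y).

    covariates : list of covariate vectors X_i
    times      : observed times T_i
    deltas     : censoring indicators delta_i (1 = uncensored)
    g_left     : estimates of G(T_i- | g(X_i)), assumed to be < 1
    """
    n = len(times)
    total = 0.0
    for xi, ti, di, gi in zip(covariates, times, deltas, g_left):
        below = all(a <= b for a, b in zip(xi, x))   # componentwise X_i <= x
        if di == 1 and ti <= y and below:
            total += 1.0 / (1.0 - gi)                # weight of an uncensored point
    return total / n
```

Only uncensored observations receive mass, and each is inflated by the inverse of its estimated probability of not having been censored. Without censoring (all `deltas` equal to 1 and all `g_left` equal to 0) the function returns the usual empirical distribution function.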

2.2 Estimation of the single index model 

We first need to introduce some notation. For $\theta \in \Theta$, let $Z_\theta = \lambda(\theta, X)$, and let $\mathcal{Z}_\theta \subset \mathbb{R}$ be the support of the variable $Z_\theta$. We assume that $\mathcal{Z}_\theta$ is compact for all $\theta \in \Theta$. Also, define $H_\theta(t \mid z) = P(T \leq t \mid Z_\theta = z)$ and let $\tau_{H_\theta, z} = \inf\{t : H_\theta(t \mid z) = 1\}$.

We assume that the following single index mean regression model is valid: for some $\beta_0 \in \mathcal{B} \subset \mathbb{R}^d$, with, say, first component $\beta_0^{(1)} = 1$,

$$ E[Y \mid X, Y \leq \tau] = E[Y \mid \beta_0' X, Y \leq \tau] = m(\beta_0' X), \qquad (2.4) $$

where $m$ is an unknown function, and where $\tau$ is some fixed truncation point, satisfying

$$ \tau < \inf_{\theta \in \Theta} \inf_{z \in \mathcal{Z}_\theta} \tau_{H_\theta, z}. $$

Let $f(t; \beta) = E[Y \mid \beta' X = t, Y \leq \tau]$. Then, $f(\cdot; \beta_0) = m(\cdot)$. Also, let $\mathcal{B} = \{1\} \times \tilde{\mathcal{B}}$, where $\tilde{\mathcal{B}}$ is a compact subset of $\mathbb{R}^{d-1}$, and denote by $\mathcal{X}$ the support of the covariate vector $X$, which is a compact subset of $\mathbb{R}^d$.

The truncation at $\tau$ in model (2.4) is very natural and common in the context of regression with right censored observations, and is caused by the lack of information in the right tail of the conditional distribution of $Y$ given $X$. See e.g. Akritas (1994) and Akritas and Van Keilegom (2000) for similar truncation mechanisms. Note that when $\mathcal{L}(Y \mid X) = \mathcal{L}(Y \mid \beta_0' X)$, i.e. when the whole distribution of $Y$ given $X$ only depends on $X$ via $\beta_0' X$, then model (2.4) is satisfied for any value of $\tau$.

The estimation of $\beta_0$ consists of several steps. We first explain these steps in an informal, intuitive way to outline the main ideas behind the proposed method, and we next work out each of these steps in a rigorous way.

1. Estimate $f(t; \beta)$ using some nonparametric estimator $\hat f(t; \beta)$.

2. Construct a preliminary consistent estimator $\beta_n$ of $\beta_0$.

3. Use $\beta_n$ to compute a trimming function. This trimming function avoids technical problems caused by denominators close to zero in the nonparametric estimation of $f(t; \beta)$.

4. Construct a second semiparametric estimator $\hat\beta$ of $\beta_0$ by using the trimming function of the preceding step.



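2.2.1 Estimation of $f(t; \beta)$

One possible estimator of $f(t; \beta)$ is

$$ \hat f(t; \beta) = \frac{\int K\left( \frac{\beta' x - t}{h} \right) y 1_{y \leq \tau} \, d\hat F_{\hat g}(x, y)}{\int K\left( \frac{\beta' x - t}{h} \right) 1_{y \leq \tau} \, d\hat F_{\hat g}(x, y)}, \qquad (2.5) $$

where $h = h_n$ is a second bandwidth sequence, possibly different from the bandwidth $a_n$ used to estimate the joint distribution $F(x, y)$, and where $K$ is a kernel function. However, other estimators may be used, for example $[\hat F_\beta(\tau \mid t)]^{-1} \int y 1_{y \leq \tau} \, d\hat F_\beta(y \mid t)$, where $\hat F_\beta(y \mid t)$ denotes Beran's (1981) estimator of $P(Y \leq y \mid \beta' X = t)$.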
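A kernel regression estimator of this type, in which each uncensored observation is weighted by the inverse of its estimated probability of being uncensored, can be sketched as follows. This is an illustrative sketch under stated assumptions: the Epanechnikov kernel, the function names, and the precomputed inputs `g_left[i]` (estimates of the censoring distribution at $T_i-$) are hypothetical choices, not the authors' code.

```python
def epanechnikov(u):
    """Epanechnikov kernel (an illustrative choice of K)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def f_hat(t, beta, covariates, times, deltas, g_left, tau, h,
          kernel=epanechnikov):
    """Weighted kernel estimate of f(t; beta) = E[Y | beta'X = t, Y <= tau]."""
    num = 0.0
    den = 0.0
    for xi, ti, di, gi in zip(covariates, times, deltas, g_left):
        # Integrating against the weighted empirical distribution only
        # involves uncensored points, for which T_i = Y_i.
        if di == 1 and ti <= tau:
            index = sum(b * v for b, v in zip(beta, xi))
            w = kernel((index - t) / h) / (1.0 - gi)
            num += w * ti
            den += w
    # A zero denominator is exactly what a trimming function guards against.
    return num / den if den > 0.0 else float("nan")
```

The common factor $1/n$ of numerator and denominator cancels, so it is omitted in the sketch.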

In what follows, we do not specify the choice of estimator of $f(t; \beta)$. Instead we will work with a generic estimator $\hat f(t; \beta)$ that satisfies certain conditions needed to obtain the asymptotic normality of $\hat\beta$, and we will prove in Section 3.2 that the estimator in (2.5) satisfies these conditions.

2.2.2 Preliminary estimation of $\beta_0$

We assume that we know some set $B$ such that

$$ \inf_{\beta \in \mathcal{B}, \, x \in B} f^\tau_\beta(\beta' x) = c > 0, $$

where the function $f^\tau_\beta$ denotes the density of $\beta' X$, conditionally on $Y \leq \tau$. Define the following preliminary trimming function:

$$ \tilde J(x) = 1_{x \in B}. \qquad (2.6) $$

Let $M(\beta, f, \tilde J) = E[(Y - f(\beta' X; \beta))^2 1_{Y \leq \tau} \tilde J(X)]$, and note that this is minimized as a function of $\beta$ when $\beta = \beta_0$. Motivated by this fact, we define the preliminary estimator of $\beta_0$ by replacing all unknown quantities in $M(\beta, f, \tilde J)$ by appropriate estimators, i.e.

$$ \beta_n = \arg\min_{\beta \in \mathcal{B}} \int (y - \hat f(\beta' x; \beta))^2 1_{y \leq \tau} \tilde J(x) \, d\hat F_{\hat g}(x, y) = \arg\min_{\beta \in \mathcal{B}} M_n(\beta, \hat f, \tilde J). \qquad (2.7) $$

Note that other criterion functions can be used, based on M- or L-estimating functions. We do not consider them here, since their analysis is very similar to the one for the least squares criterion function.
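The least squares criterion and its minimization can be illustrated by the following sketch. It is deliberately simplified and all names are hypothetical: the regression estimator is passed in as a callable `f_est(index, beta)`, the trimming function as a callable `trim(x)`, the censoring weights `g_left` are assumed precomputed, and the minimization is a plain search over a finite list of candidates rather than a numerical optimizer.

```python
def criterion(beta, f_est, trim, covariates, times, deltas, g_left, tau):
    """Empirical trimmed least squares criterion M_n(beta, f_est, trim)."""
    n = len(times)
    total = 0.0
    for xi, ti, di, gi in zip(covariates, times, deltas, g_left):
        # Only uncensored, non-truncated, non-trimmed observations contribute.
        if di == 1 and ti <= tau and trim(xi):
            index = sum(b * v for b, v in zip(beta, xi))
            total += (ti - f_est(index, beta)) ** 2 / (1.0 - gi)
    return total / n


def argmin_on_grid(candidates, f_est, trim, covariates, times, deltas,
                   g_left, tau):
    """Return the candidate beta with the smallest criterion value."""
    return min(
        candidates,
        key=lambda b: criterion(b, f_est, trim, covariates, times,
                                deltas, g_left, tau),
    )
```

For example, with a noiseless linear link and no censoring, the search recovers the true index among the candidates:

```python
covs = [[0.1, 0.2], [0.4, 0.9], [0.7, 0.3], [0.2, 0.8]]
ys = [x[0] + 0.5 * x[1] for x in covs]
best = argmin_on_grid(
    [(1.0, 0.0), (1.0, 0.5), (1.0, 1.0)],
    lambda t, b: t,        # hypothetical known identity link
    lambda x: True,        # no trimming
    covs, ys, [1, 1, 1, 1], [0.0] * 4, 10.0,
)
```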

2.2.3 New trimming function 

We will now refine the definition of the trimming function, by using the preliminary estimator $\beta_n$. Define

$$ \hat J(x) = 1_{f^\tau_{\beta_n}(\beta_n' x) \geq c}, \qquad (2.8) $$

so instead of requiring that $f^\tau_\beta(\beta' x) \geq c$ for all $\beta$, we now only consider $\beta = \beta_n$, which will be satisfied for many more $x$-values, and hence this new function $\hat J(x)$ trims much less than the preliminary naive trimming function $\tilde J(x)$.

To simplify our discussion, we will directly consider that the true function $f^\tau_{\beta_n}$ is used in the definition of $\hat J$. In practice, the trimming function can be estimated by $1_{\hat f^\tau_{\beta_n}(\beta_n' x) \geq c}$, where

$$ \hat f^\tau_{\beta_n}(u) = \frac{1}{n b_n P(Y \leq \tau)} \sum_{i=1}^{n} \frac{\delta_i 1_{T_i \leq \tau}}{1 - \hat G_{\hat\theta}(T_i- \mid \hat g(X_i))} K\left( \frac{\beta_n' X_i - u}{b_n} \right), $$

and where $b_n \to 0$ is a bandwidth parameter. In applications, $c_1 = c P(Y \leq \tau)$ can be chosen arbitrarily small by the statistician. Considering $f^\tau_{\beta_n}$ or $\hat f^\tau_{\beta_n}$ does not change anything asymptotically speaking; see the arguments in Delecroix, Hristache and Patilea (2006), and see also the proof of Theorem 3.5 below. By similar arguments, the estimator of $\beta_0$ obtained with $1_{\hat f^\tau_{\beta_n}(\beta_n' x) \geq c}$ is asymptotically equivalent to the 'ideal' estimator obtained with the trimming function

$$ J_0(x) = 1_{f^\tau_{\beta_0}(\beta_0' x) \geq c}, \qquad (2.9) $$

as long as $\beta_n$ is a consistent estimator of $\beta_0$. Let us point out that $J_0$ only depends on $\beta_0' x$ and, in view of equation (A.14) in the proof of Theorem 3.5, this property will be essential for achieving $\sqrt{n}$-asymptotic normality of our estimator $\hat\beta$ defined below.
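A data-driven trimming rule of this kind can be sketched as follows. The sketch compares a censoring-weighted kernel density estimate of the index $\beta_n' X$ on the event $\{Y \leq \tau\}$, left unnormalized by $P(Y \leq \tau)$, with a small threshold `c1` playing the role of $c_1 = c P(Y \leq \tau)$. The uniform kernel, the function names and the precomputed weights `g_left` are assumptions made for illustration only.

```python
def uniform_kernel(u):
    """Uniform kernel on [-1, 1] (an illustrative choice of K)."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def trimming_indicator(x, beta_n, covariates, times, deltas, g_left,
                       tau, b_n, c1, kernel=uniform_kernel):
    """Return 1 if the weighted index density estimate at beta_n'x is >= c1."""
    n = len(times)
    u = sum(b * v for b, v in zip(beta_n, x))
    dens = 0.0
    for xi, ti, di, gi in zip(covariates, times, deltas, g_left):
        if di == 1 and ti <= tau:
            ui = sum(b * v for b, v in zip(beta_n, xi))
            dens += kernel((ui - u) / b_n) / (1.0 - gi)
    dens /= n * b_n
    return 1 if dens >= c1 else 0
```

Points whose index value lies in a sparsely populated region are discarded, which keeps the denominator of the kernel regression estimator away from zero.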

2.2.4 Estimation of $\beta_0$

With this new trimming function at hand, we can now define a new semiparametric least squares estimator of $\beta_0$:

$$ \hat\beta = \arg\min_{\beta \in \mathcal{B}_n} \int (y - \hat f(\beta' x; \beta))^2 1_{y \leq \tau} \hat J(x) \, d\hat F_{\hat g}(x, y) = \arg\min_{\beta \in \mathcal{B}_n} M_n(\beta, \hat f, \hat J), \qquad (2.10) $$

where $\mathcal{B}_n$ is a set shrinking to $\{\beta_0\}$, which is computed from the preliminary step. The proof of the asymptotic normality of $\hat\beta$ will be carried out in two steps. We will first show that minimizing $M_n(\beta, \hat f, \hat J)$ is asymptotically equivalent to minimizing $M_n(\beta, f, J_0)$. This then reduces the minimization problem to a fully parametric one.



3 Asymptotic properties 

3.1 Estimation of the distribution $F(x, y)$

Let us first introduce some notation. Denote $H(t) = P(T \leq t)$, $H_\theta(t \mid z) = P(T \leq t \mid Z_\theta = z)$, $H_{\theta,0}(t \mid z) = P(T \leq t, \delta = 0 \mid Z_\theta = z)$, and $H_{\theta,1}(t \mid z) = P(T \leq t, \delta = 1 \mid Z_\theta = z)$. For any function $L(u)$, let $\nabla_u L(u)$ (respectively $\nabla^2_{u,u} L(u)$) denote the vector (respectively matrix) of partial derivatives of order 1 (respectively order 2) of $L(u)$ with respect to $u$. In particular, denote by $\nabla_\theta G_\theta(t \mid \lambda(\theta, x))$ the vector of partial derivatives of the function $G_\theta(t \mid \lambda(\theta, x))$ with respect to all occurrences of $\theta$. Let us point out that, in general, the vector valued function $\nabla_\theta G_\theta(t \mid \lambda(\theta, x))$ depends on $x$, and not only on $\lambda(\theta, x)$. Finally, for any matrix $A$ of dimensions $k \times \ell$ (where $k, \ell \geq 1$) we denote $\|A\| = [\mathrm{trace}(A'A)]^{1/2}$.

We further need to introduce two (intermediate) estimators of $F(x, y)$:

$$ \tilde F_g(x, y) = \frac{1}{n} \sum_{i=1}^{n} \frac{\delta_i 1_{T_i \leq y, X_i \leq x}}{1 - G_{\theta_0}(T_i- \mid g(X_i))}, \qquad (3.1) $$

$$ \hat F_g(x, y) = \frac{1}{n} \sum_{i=1}^{n} \frac{\delta_i 1_{T_i \leq y, X_i \leq x}}{1 - \hat G_{\theta_0}(T_i- \mid g(X_i))}. \qquad (3.2) $$

In the following result we consider integrals of the form $\int \phi(x, y) \, d\hat F_g(x, y)$ with $\phi$ belonging to some class of functions $\mathcal{F}$, and we state that this class of integrals is Glivenko-Cantelli and admits an i.i.d. representation uniformly over all $\phi \in \mathcal{F}$. The proof can be found in Lopez (2007b). For a completely nonparametric estimator of $F(x, y)$ that is not based on model assumption (A0), Sánchez-Sellero, González-Manteiga and Van Keilegom (2005) obtained a similar uniform consistency and convergence result. The assumptions mentioned below can be found in Appendix A.

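Theorem 3.1 i) Under the relevant assumptions of Appendix A, for $a_n \to 0$ and $n a_n \to \infty$, and for a class $\mathcal{F}$ satisfying the relevant condition of Appendix A, we have

$$ \sup_{\phi \in \mathcal{F}} \left| \int \phi(x, y) \, d[\hat F_g - F](x, y) \right| \to_{a.s.} 0. $$

ii) For $Z_i = \lambda(\theta_0, X_i)$, define

$$ M_i(t) = (1 - \delta_i) 1_{T_i \leq t} - \int_{-\infty}^{t} \frac{1_{T_i \geq y} \, dG_{\theta_0}(y \mid Z_i)}{1 - G_{\theta_0}(y- \mid Z_i)}, $$

which is a continuous time martingale with respect to the natural filtration $\sigma(\{Z_i 1_{T_i \leq t}, T_i 1_{T_i \leq t}, \delta_i 1_{T_i \leq t} : i = 1, \ldots, n\})$. Under the relevant assumptions of Appendix A, and for a class $\mathcal{F}$ satisfying the relevant conditions of Appendix A,

$$ \int \phi(x, y) \, d[\hat F_g - \tilde F_g](x, y) = \frac{1}{n} \sum_{i=1}^{n} \int \frac{\bar\phi(Z_i, s) \, dM_i(s)}{[1 - F(s- \mid Z_i)][1 - G_{\theta_0}(s \mid Z_i)]} + R_n(\phi), $$

where $\sup_{\phi \in \mathcal{F}} |R_n(\phi)| = o_P(n^{-1/2})$, $\bar\phi$ is defined above Condition 3, and $F(s \mid z) = P(Y \leq s \mid Z_{\theta_0} = z)$.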

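The following theorem describes the behavior of the difference between integrals with respect to $\hat F_{\hat g}$ and integrals with respect to $\hat F_g$.

Theorem 3.2 i) Under the relevant assumptions of Appendix A, for $a_n \to 0$ and $n a_n \to \infty$, and for a class $\mathcal{F}$ satisfying the relevant condition of Appendix A, we have

$$ \sup_{\phi \in \mathcal{F}} \left| \int \phi(x, y) \, d[\hat F_{\hat g} - \hat F_g](x, y) \right| = o_P(1). $$

ii) Under the relevant assumptions of Appendix A, for $a_n \to 0$ and $n a_n^{\ldots} (\log n)^{\ldots} \to \infty$, and for a class $\mathcal{F}$ whose envelope is as in the relevant condition of Appendix A,

$$ \int \phi(x, y) \, d[\hat F_{\hat g} - \hat F_g](x, y) = E\left[ \frac{\phi(X, Y) \left( \nabla_\theta G_{\theta_0}(Y- \mid \lambda(\theta_0, X)) \right)'}{1 - G_{\theta_0}(Y- \mid g(X))} \right] \frac{1}{n} \sum_{i=1}^{n} \mu(T_i, \delta_i, X_i) + R_n(\phi), $$

where the function $\mu$ is defined in (C0), and where $\sup_{\phi \in \mathcal{F}} |R_n(\phi)| = o_P(n^{-1/2})$.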
3.2 Estimation of the single index model 

We now return to the single index model (2.4) and to the estimators $\beta_n$ and $\hat\beta$ defined in (2.7) and (2.10). We start by stating the consistency of the estimator $\beta_n$. Note that the estimator $\hat\beta$ is by construction consistent, since it is defined on a shrinking neighborhood of $\beta_0$.

Theorem 3.3 Let $\tilde J$ be defined as in (2.6). Under the relevant assumptions of Appendix A, and for $a_n \to 0$ and $n a_n \to \infty$, we have

$$ \sup_{\beta \in \mathcal{B}} |M_n(\beta, \hat f, \tilde J) - M(\beta, f, \tilde J)| \to 0, $$

in probability. Consequently, $\beta_n \to \beta_0$ in probability.

The next lemma states a property that is important in the literature on single index models. In the classical uncensored single index regression model, the property $E[\nabla_\beta f(\beta_0' X; \beta_0) \mid \beta_0' X] = 0$ plays a major role in proving the asymptotic normality of M-estimators. See Delecroix, Hristache and Patilea (2006). The next lemma shows that in our context, where we have to truncate at $\tau$ because of censoring in the data, the analogous truncated version of this property holds true without any further model conditions.

Lemma 3.4 Assume that the derivative $\nabla_\beta f(\beta_0' \cdot; \beta_0)$ exists and is bounded. Then, for any $\beta_0$ satisfying condition (2.4),

$$ E[\nabla_\beta f(\beta_0' X; \beta_0) 1_{Y \leq \tau} \mid \beta_0' X] = 0. $$

This lemma is crucial for obtaining our i.i.d. representation and the asymptotic normality of $\hat\beta$, which we state in the next theorem. We denote by $\nabla_{\tilde\beta} f(\beta_0' x; \beta_0)$ the vector of partial derivatives with respect to the last $d - 1$ components of $\beta$.

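Theorem 3.5 Let $\phi(x, y) = (y - f(\beta_0' x; \beta_0)) \nabla_{\tilde\beta} f(\beta_0' x; \beta_0) 1_{y \leq \tau} J_0(x)$. Under Assumptions 1 to 11, we have

$$ \hat\beta - \beta_0 = \Omega^{-1} \left\{ \int \phi(x, y) \, d(\tilde F_g(x, y) - F(x, y)) + \frac{1}{n} \sum_{i=1}^{n} \int \frac{\bar\phi(g(X_i), s) \, dM_i(s)}{[1 - F(s- \mid g(X_i))][1 - G_{\theta_0}(s \mid g(X_i))]} + E\left[ \frac{\phi(X, Y) \left( \nabla_\theta G_{\theta_0}(Y- \mid \lambda(\theta_0, X)) \right)'}{1 - G_{\theta_0}(Y- \mid g(X))} \right] \frac{1}{n} \sum_{i=1}^{n} \mu(T_i, \delta_i, X_i) \right\} + o_P(n^{-1/2}) \qquad (3.3) $$

$$ = \Omega^{-1} \frac{1}{n} \sum_{i=1}^{n} \eta(T_i, \delta_i, X_i) + o_P(n^{-1/2}), $$

where the function $\mu$ is defined in (C0), and where

$$ \Omega = E[1_{Y \leq \tau} J_0(X) \nabla_{\tilde\beta} f(\beta_0' X; \beta_0) \nabla_{\tilde\beta} f(\beta_0' X; \beta_0)']. $$

Hence,

$$ \sqrt{n}(\hat\beta - \beta_0) \Rightarrow N(0, \Omega^{-1} \mathrm{Var}(\eta(T, \delta, X)) \Omega^{-1}). $$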



If we wish to estimate the asymptotic variance in Theorem 3.5, we see that we need to estimate the variance of $\Omega^{-1} \eta$. The matrix $\Omega$ can be estimated consistently by

$$ \hat\Omega = \frac{1}{n} \sum_{i=1}^{n} 1_{Y_i \leq \tau} \hat J(X_i) \nabla_{\tilde\beta} \hat f(\hat\beta' X_i; \hat\beta) \nabla_{\tilde\beta} \hat f(\hat\beta' X_i; \hat\beta)'. $$

Similarly, to estimate the covariance matrix of $\eta$, one can take the empirical variance of the random vectors $(\hat\eta(T_i, \delta_i, X_i))_{1 \leq i \leq n}$, where $\hat\eta$ denotes an estimated version of $\eta$ in which each unknown quantity is replaced by its empirical counterpart ($f$ replaced by $\hat f$, $\beta_0$ by $\hat\beta$, $F$ by $\hat F$, ...).

We end this section with the verification of Assumptions 9 to 11 for the estimator $\hat f(t; \beta)$ defined in (2.5). Define the (uncomputable) kernel estimator based on $\tilde F_g$,

$$ f^*(t; \beta) = \frac{\int K\left( \frac{\beta' x - t}{h} \right) y 1_{y \leq \tau} \, d\tilde F_g(x, y)}{\int K\left( \frac{\beta' x - t}{h} \right) 1_{y \leq \tau} \, d\tilde F_g(x, y)}. \qquad (3.4) $$

The advantage of $\tilde F_g$, and hence of $f^*$, is that it is composed of sums of i.i.d. terms. Classical arguments show that $f^*$ satisfies Assumptions 9 to 11. This is shown in Proposition 3.6 below. On the other hand, Proposition 3.7 shows that the difference between $\hat f$ and $f^*$ is sufficiently small so that $\hat f$ also satisfies these assumptions.

Proposition 3.6 Assume that

(i) $K$ is a symmetric density function with compact support, and with two continuous derivatives of bounded variation;

(ii) $f(\cdot; \beta_0) \in \mathcal{H}_1$ and $\nabla_\beta f(\beta_0' \cdot; \beta_0) \in \mathcal{H}_2$, where $\mathcal{H}_1$ and $\mathcal{H}_2$ are defined in (A.4) and (A.5);

(iii) $n h^{\ldots} (\log n)^{\ldots} \to \infty$, and $n h^{\ldots} \to 0$.

Then, $f^*$ satisfies Assumptions 9 to 11.

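Proposition 3.7 Under the assumptions of Theorem 3.2, we have

$$ \sup_{\beta, x} |f^*(\beta' x; \beta) - \hat f(\beta' x; \beta)| = O_P((\log n)^{1/2} n^{-1/2} a_n^{-1/2}), $$

$$ \sup_{\beta, x} |\nabla_\beta f^*(\beta' x; \beta) - \nabla_\beta \hat f(\beta' x; \beta)| = O_P((\log n)^{1/2} h^{-1} n^{-1/2} a_n^{-1/2}), $$

$$ \sup_{\beta, x} |\nabla^2_{\beta,\beta} f^*(\beta' x; \beta) - \nabla^2_{\beta,\beta} \hat f(\beta' x; \beta)| = O_P((\log n)^{1/2} h^{-2} n^{-1/2} a_n^{-1/2}), $$

where $\hat f$ is the estimator defined in (2.5). Moreover, $\nabla_\beta \hat f(\beta_0' x; \beta_0) = x \hat m_1(\beta_0' x) + \hat m_2(\beta_0' x)$, with, for $j = 1, 2$,

$$ \sup_x |\hat m_j(\beta_0' x) - m_j^*(\beta_0' x)| = O_P((\log n)^{1/2} h^{-1} n^{-1/2} a_n^{-1/2}), $$

$$ \sup_u |\hat m_j'(u) - m_j^{*\prime}(u)| = O_P((\log n)^{1/2} h^{-2} n^{-1/2} a_n^{-1/2}), $$

where the functions $m_j^*$ are defined in (A.15), and where $m'$ denotes the derivative of the univariate function $\beta_0' \mathcal{X} \ni u \mapsto m(u)$.

Notice that $\hat f'(u; \beta_0) = \hat m_1(u)$ (resp. $f^{*\prime}(u; \beta_0) = m_1^*(u)$). Combining Propositions 3.6 and 3.7 shows that $\hat f$ satisfies Assumptions 9 to 11 provided that $n h^{\ldots} \to 0$, $n a_n h^{\ldots} (\log n)^{\ldots} \to \infty$ and $h a_n^{\ldots} (\log n)^{\ldots} \to 0$. In the case where $a_n = \ldots$ for some $\delta \in ]0, 1[$, these conditions are satisfied if $n h^{\ldots} (\log n)^{\ldots} \to \infty$ and $n h^{\ldots} (\log n)^{\ldots} \to 0$.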

4 Simulation study 

To investigate the small sample behaviour of our procedure, we considered two different models in a simulation study. In the first model, we consider the regression function

$$ m_1(\beta_0' x) = \beta_0' x - 0.5 (\beta_0' x)^2, $$

and in the second

$$ m_2(\beta_0' x) = \log(1 + 0.5 \beta_0' x), $$

with $\beta_0 = (1, 0.75, 0.25, -0.5)$. We consider residuals $\varepsilon_i = Y_i - m_j(\beta_0' X_i)$ (for $j = 1, 2$) that are Gaussian variables $N(0, 1)$ independent of $X_i$. The covariates are composed of 4 independent components, each following a uniform distribution on $[0, 1]$.

Regression      Proportion of censoring
model           15%                     30%                     50%
                $\hat\beta$   KM        $\hat\beta$   KM        $\hat\beta$   KM
$m_1$           1.022         1.463     1.147         1.279     1.619         1.728
$m_2$           0.580         1.480     1.290         1.613     1.407         1.633

Table 1: Comparison of the MSE of the proposed estimator (columns $\hat\beta$) with the one based on Kaplan-Meier weights (columns KM) for different proportions of censoring.

Concerning the censoring variable, we take $C_i \mid X_i \sim \mathcal{E}(\lambda \exp(\theta_0' X_i))$, where $\mathcal{E}(\cdot)$ denotes the exponential distribution, $\theta_0 = (0.1, 0.2, -0.1, 0.3)$, and $\lambda$ is a parameter that allows us to modify the average proportion of censored responses. The parameter $\theta_0$ is estimated by maximizing the Cox pseudo-likelihood, since the regression model on $C_i$ is a proportional hazards model.
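The simulation design can be reproduced along the following lines. This is a hedged sketch rather than the authors' code: the function and constant names are hypothetical, the argument of the exponential distribution is read as a rate (so that the hazard of $C_i$ is $\lambda \exp(\theta_0' X_i)$, consistent with a proportional hazards model), and the sign in the quadratic term of the first regression function follows the reading $u - 0.5u^2$ of the source.

```python
import math
import random

BETA0 = (1.0, 0.75, 0.25, -0.5)
THETA0 = (0.1, 0.2, -0.1, 0.3)


def m1(u):
    """First regression function, read as u - 0.5 * u^2."""
    return u - 0.5 * u * u


def m2(u):
    """Second regression function, log(1 + 0.5 * u)."""
    return math.log(1.0 + 0.5 * u)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def draw_sample(n, lam, link=m1, seed=0):
    """Draw n triplets (X_i, T_i, delta_i) from the simulation design."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(4)]             # uniform covariates on [0, 1]
        y = link(dot(BETA0, x)) + rng.gauss(0.0, 1.0)    # N(0, 1) residual
        rate = lam * math.exp(dot(THETA0, x))            # assumed hazard of C given X
        c = rng.expovariate(rate)
        data.append((x, min(y, c), 1 if y <= c else 0))
    return data


def censoring_proportion(data):
    return sum(1 - delta for _, _, delta in data) / len(data)
```

Increasing `lam` shortens the censoring times and therefore raises the proportion of censored responses, so `lam` can be tuned until `censoring_proportion` is close to a target such as 15%, 30% or 50%.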

We consider 10000 replications of this simulation scheme for $n = 200$. For each simulated sample $j$, we compute the resulting estimator $\hat\beta^{(j)}$ of $\beta_0$ and compute $\|\hat\beta^{(j)} - \beta_0\|_2^2$. We then deduce an estimator of the mean square error (MSE) $E[\|\hat\beta - \beta_0\|_2^2]$. We took $a_n = 2$ for the bandwidth involved in Beran's estimator. Since the procedure is more sensitive to the choice of the second bandwidth $h$, we consider a set of bandwidths $h_j = 0.5 + 0.1 j$, for $j = 1, \ldots, 10$, and for each sample, we take the bandwidth that gives the lowest value of $M_n$. In Table 1 we compare the MSE of the estimator that we propose to the MSE of an estimator based on Kaplan-Meier weights, that is, the estimator obtained by replacing Beran's estimator in our approach by a standard Kaplan-Meier estimator. This alternative estimator is the one defined in Lopez (2009). Like our approach, this estimator puts more weight on the largest uncensored observations to compensate for censoring. Nevertheless, this alternative procedure is not adapted to Assumption (A0) that we use in the present framework. Therefore, the estimator of Lopez (2009) is expected to fail in this simulation setting.
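The Monte Carlo bookkeeping described above (squared Euclidean error per replication, averaging over replications, and selection of the bandwidth on a grid) can be sketched as follows. The function names are hypothetical, and `criterion_value` stands for any user-supplied function returning the value of the criterion $M_n$ obtained with a given bandwidth.

```python
def squared_error(beta_hat, beta0):
    """Squared Euclidean distance between an estimate and the true parameter."""
    return sum((a - b) ** 2 for a, b in zip(beta_hat, beta0))


def monte_carlo_mse(estimates, beta0):
    """Average squared error over the replications."""
    return sum(squared_error(b, beta0) for b in estimates) / len(estimates)


def bandwidth_grid():
    """Grid h_j = 0.5 + 0.1 * j for j = 1, ..., 10."""
    return [0.5 + 0.1 * j for j in range(1, 11)]


def select_bandwidth(grid, criterion_value):
    """Return the bandwidth on the grid with the smallest criterion value."""
    return min(grid, key=criterion_value)
```

In a full replication, `select_bandwidth` would be called once per simulated sample, and the estimate computed with the selected bandwidth would be appended to the list passed to `monte_carlo_mse`.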

As expected, our estimator based on conditional Kaplan-Meier weighting outperforms the estimator of Lopez (2009) in the different situations we consider. It is also natural to observe that the MSE of our estimator $\hat\beta$ increases when the proportion of censoring increases.



Appendix A : Assumptions and conditions 

We split the assumptions into three parts, namely those required for the estimation of $F(x, y)$, the estimation of $\beta_0$, and the estimation of $f(\cdot; \beta)$.

Assumptions needed for the estimation of $F(x, y)$. The asymptotic results related to the estimator $\hat F_{\hat g}(x, y)$ will be valid under the following assumptions and conditions.

Assumption 1 The distribution $P(Z_\theta \leq z)$ has three uniformly bounded derivatives for $z \in \mathcal{Z}_\theta$ and $\theta \in \Theta$, and the densities $f_{Z_\theta}(z)$ satisfy $\inf_{\theta \in \Theta} \inf_{z \in \mathcal{Z}_\theta} f_{Z_\theta}(z) > 0$.

For any function $J(t \mid z)$ we will denote by $J_c(t \mid z)$ the continuous part, and $J_d(t \mid z) = J(t \mid z) - J_c(t \mid z)$. Assumption 2 below has been introduced by Du and Akritas (2002) to obtain their asymptotic i.i.d. representation of the conditional Kaplan-Meier estimator.

Assumption 2 (i) Let $L(y \mid z)$ denote $H_{\theta_0}(y \mid z)$ or $H_{\theta_0,0}(y \mid z)$. Then, $\nabla_z L(y \mid z)$ and $\nabla^2_{z,z} L(y \mid z)$ exist, are continuous with respect to $z$, and are uniformly bounded as functions of $(z, y)$.

(ii) For some positive nondecreasing bounded (on $[-\infty; \tau]$) functions $L_1$, $L_2$, $L_3$, we have, for all $z \in \mathcal{Z}_{\theta_0}$,

$$ |H_{\theta_0,c}(t_1 \mid z) - H_{\theta_0,c}(t_2 \mid z)| \leq |L_1(t_1) - L_1(t_2)|, $$
$$ |\nabla_z H_{\theta_0,c}(t_1 \mid z) - \nabla_z H_{\theta_0,c}(t_2 \mid z)| \leq |L_2(t_1) - L_2(t_2)|, $$
$$ |\nabla_z H_{\theta_0,0c}(t_1 \mid z) - \nabla_z H_{\theta_0,0c}(t_2 \mid z)| \leq |L_3(t_1) - L_3(t_2)|, $$

the last two assumptions implying the same kind of condition for $\nabla_z H_{\theta_0,1c}$.

(iii) The jumps of $F_g(\cdot \mid z)$ and $G_{\theta_0}(\cdot \mid z)$ are the same for all $z \in \mathcal{Z}_{\theta_0}$. Let $(d_1, d_2, \ldots)$ be the atoms of $G$.

(iv) $F_g(\cdot \mid z)$ and $G_{\theta_0}(\cdot \mid z)$ have two derivatives with respect to $z$, with the first derivatives uniformly bounded (on $[-\infty; \tau]$). The variation of the functions $\nabla_z F_g(\cdot \mid z)$ and $\nabla^2_{z,z} F_g(\cdot \mid z)$ on $[-\infty; \tau]$ is bounded by a constant not depending on $z$.

(v) For all $d_i$, define

$$ s_i = \sup_z |F_g(d_i- \mid z) - F_g(d_i \mid z)|, $$
$$ s_i' = \sup_z |\nabla_z F_g(d_i- \mid z) - \nabla_z F_g(d_i \mid z)|, $$
$$ r_i = \sup_z |G_{\theta_0}(d_i- \mid z) - G_{\theta_0}(d_i \mid z)|, $$
$$ r_i' = \sup_z |\nabla_z G_{\theta_0}(d_i- \mid z) - \nabla_z G_{\theta_0}(d_i \mid z)|. $$

Then, $\sum_{d_i \leq \tau} (s_i + s_i' + r_i + r_i') < \infty$.

Assumption 3 The kernel $K$ is a symmetric probability density function with compact support, and $K$ has bounded second derivative.

Assumption 4 The bandwidth $a_n$ satisfies $(\log n) n^{-1} a_n^{\ldots} \to 0$ and $n a_n^{\ldots} \to 0$.

Assumption 5 The function $(x, t, \theta) \mapsto G_\theta(t \mid \lambda(\theta, x))$ is differentiable with respect to $\theta$, and the vector $\nabla_\theta G_\theta(t \mid \lambda(\theta, x))$ is uniformly bounded in $(x, t, \theta)$.

The class of functions $\mathcal{F}$ considered in Section 3.1 should satisfy the following conditions, which are taken from Lopez (2007b). The conditions make use of concepts from the theory of empirical processes, which can be found e.g. in Van der Vaart and Wellner (1996).

Condition 1 Let $\rho_0(x, y, c) = 1_{y \leq c} [1 - G_{\theta_0}(y- \mid g(x))]^{-1}$. The class $\rho_0 \mathcal{F}$ is $\mathbb{P}_{(X,Y,C)}$-Glivenko-Cantelli, and has an integrable envelope $\Phi_0$ satisfying $\Phi_0(x, y, c) = 0$ for $y > \tau$.

Condition 2 The covering number $N(\varepsilon, \mathcal{F}, L^2(\mathbb{P}_{(X,Y)}))$ is bounded by $A \varepsilon^{-V}$ for $\varepsilon > 0$ and for some $A, V > 0$, and $\mathcal{F}$ has a square integrable envelope $\Phi$ satisfying $\Phi(x, y) = 0$ for $y > \tau$.

Let $Z = Z_{\theta_0} = g(X)$, let $F_z(x, y) = P(X \leq x, Y \leq y \mid Z = z)$, and for any function $\phi(x, y)$, define $\bar\phi(z, s) = \int 1_{s \leq y} \phi(x, y) \, dF_z(x, y)$. Let $\mathcal{Z}_{\theta_0,\eta}$ be the set of all points at a distance at least $\eta > 0$ from the complement of $\mathcal{Z}_{\theta_0}$.

Condition 3 For all $\phi \in \mathcal{F}$, $\bar\phi$ is twice differentiable with respect to $z$, and

$$ \sup_{z, s} \{ |\nabla_z \bar\phi(z, s)| + |\nabla^2_{z,z} \bar\phi(z, s)| \} \leq M < \infty, $$

for some constant $M$ not depending on $\phi$. Moreover, $\Phi$ is bounded on $\mathcal{Z}_{\theta_0,\eta} \times ]-\infty; \tau]$, and has bounded partial derivatives with respect to $z$, where $\Phi$ is the envelope function of Condition 2.

The reason for introducing the set $\mathcal{Z}_{\theta_0,\eta}$ is to avoid boundary effects coming from kernel estimators. See Lopez (2007b) for a detailed discussion of this issue.



Assumptions needed for the estimation of $\beta_0$. We next state the additional assumptions needed for the asymptotic results concerning the estimation of the parameters in the single index model.

Assumption 6 There exist $0 < c_0 < c_1 < \infty$ and $\eta > 0$ such that, for each $c \in [c_0, c_1]$ and $x \in \mathcal{X}$,

Moreover, assume that

$$ |f^\tau_{\beta_1}(\beta_1' x) - f^\tau_{\beta_2}(\beta_2' x)| \leq C \|\beta_1 - \beta_2\|^a, $$

for some positive constant $C$ and some $a > 0$.

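Assumption 7 (i) $E(|Y|^{\ldots}) < \infty$;

(ii) $E[\{f(\beta' X; \beta) - f(\beta_0' X; \beta_0)\}^2 1_{Y \leq \tau}] = 0 \Rightarrow \beta = \beta_0$;

(iii) $\beta_0 = (1, \tilde\beta_0')'$ with $\tilde\beta_0$ an interior point of $\tilde{\mathcal{B}}$;

(iv) The class $\{(x, y) \mapsto f(\beta' x; \beta) 1_{y \leq \tau} : \beta \in \mathcal{B}\}$ satisfies Condition 1 for a continuous integrable envelope.

Assumption 8 The classes $\{x \mapsto \nabla_\beta f(\beta' x; \beta) : \beta \in \mathcal{B}\}$ and $\{x \mapsto \nabla^2_{\beta,\beta} f(\beta' x; \beta) : \beta \in \mathcal{B}\}$ are VC-classes of continuous functions for a uniformly bounded envelope.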

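Assumptions needed for the estimation of $f(\cdot; \beta)$. The last group of assumptions is required for the generic estimator $\hat f(\cdot; \beta)$. They are verified in Section 3.2 for the estimator defined in (2.5).

Assumption 9 For all $c > 0$,

$$ \sup_{\beta, x} |\hat f(\beta' x; \beta) - f(\beta' x; \beta)| 1_{f^\tau_\beta(\beta' x) \geq c} = o_P(1), \qquad (A.1) $$

$$ \sup_{\beta, x} |\nabla_\beta \hat f(\beta' x; \beta) - \nabla_\beta f(\beta' x; \beta)| 1_{f^\tau_\beta(\beta' x) \geq c} = o_P(1), \qquad (A.2) $$

$$ \sup_{\beta, x} |\nabla^2_{\beta,\beta} \hat f(\beta' x; \beta) - \nabla^2_{\beta,\beta} f(\beta' x; \beta)| 1_{f^\tau_\beta(\beta' x) \geq c} = o_P(1). \qquad (A.3) $$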

Assumption 10 There exist Donsker classes $\mathcal{H}_1$ and $\mathcal{H}_2$ such that $f(\cdot; \beta_0) \in \mathcal{H}_1$ and
$\nabla_\beta f(\beta_0'\cdot; \beta_0) \in \mathcal{H}_2$, and such that with probability tending to one, $\hat f(\cdot; \beta_0) \in \mathcal{H}_1$ and
$\nabla_\beta \hat f(\beta_0'\cdot; \beta_0) \in \mathcal{H}_2$.

Typical examples of such Donsker classes are classes of regular functions. Let
r = {P'qX : X G A"} C M and let CliT.M) = {h:T^M.^ : supt^rll^WI + l^'(^)l} < 
for £ > 1 and for some M < oo. Define 

n1 = cl{r,M), (A.4) 

n'^^ = {h: X -.x^ xhiP'ox) + h2{^ox) : h G Cj(r, M), /is G C^(r, M)}.(A.5) 

The class $\mathcal{H}_2$ is a Donsker class, which follows from stability properties of Donsker classes
(see e.g. Examples 2.10.7 and 2.10.10 in Van der Vaart and Wellner (1996)).

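Assumption 11 For all $c > 0$,

$$\sup_{x} |\hat f(\beta_0'x; \beta_0) - f(\beta_0'x; \beta_0)|\, 1_{f^{\tau}_{\beta_0}(\beta_0'x) > c} = O_P(\varepsilon_n),$$

$$\sup_{x} |\nabla_\beta \hat f(\beta_0'x; \beta_0) - \nabla_\beta f(\beta_0'x; \beta_0)|\, 1_{f^{\tau}_{\beta_0}(\beta_0'x) > c} = O_P(\varepsilon_n'),$$

where $\varepsilon_n$ and $\varepsilon_n'$ satisfy $\varepsilon_n \varepsilon_n' = o(n^{-1/2})$, $a_n^{-1/2}(\log n)^{1/2}\varepsilon_n \to 0$ and $a_n^{-1/2}(\log n)^{1/2}\varepsilon_n' \to 0$.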

Appendix B : Technical lemmas and proofs 

We start this Appendix with two technical lemmas, needed in the proofs of the main 
results. The first technical lemma gives a concentration inequality for the convergence 
rate of semi-parametric estimators. 

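Let $b_n$ be a sequence of real numbers tending to zero, and let $\{\zeta_\alpha : \alpha \in \mathcal{A}\}$ be a
family of uniformly bounded functions, where $\mathcal{A}$ is a compact subset of $\mathbb{R}^p$ (with $p \ge 1$).
Consider the class of functions

$$\mathcal{G} = \Big\{(u,z,t,\delta) \mapsto g_{\alpha,x,v}(u,z,t,\delta) = K^0\Big(\frac{\psi(\alpha,x) - \psi(\alpha,u)}{b_n}\Big)\,\zeta_\alpha(x,u,z,t,\delta)\,\xi(t)\,1_{t \le v} : \alpha \in \mathcal{A},\ x \in \mathcal{X},\ v \in \mathbb{R}\Big\}, \tag{A.6}$$

where $K^0$, $\psi$ and $\xi$ are fixed functions, $\mathcal{X} \subset \mathbb{R}^d$ is a compact set, and $t \in \mathbb{R}$, and consider
the process (in $\alpha$, $x$ and $v$)

$$W_n(g_{\alpha,x,v}) = \sum_{i=1}^n \big(g_{\alpha,x,v}(X_i, Z_i, T_i, \delta_i) - E[g_{\alpha,x,v}(X, Z, T, \delta)]\big).$$

Typically, $K^0$ denotes either a kernel or its derivative of order 1 or 2.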
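
Lemma A.1 Assume that the class of functions

$$\Big\{(u, z, t, \delta) \mapsto K^0\Big(\frac{\psi(\alpha,x) - \psi(\alpha,u)}{b_n}\Big)\,\zeta_\alpha(x, u, z, t, \delta) : \alpha \in \mathcal{A},\ x \in \mathcal{X}\Big\} \tag{A.7}$$

is a VC-class of functions for a constant envelope, assume that $E[|\xi(T)|^3] < \infty$, and that
$n b_n^3/(\log n) \to \infty$. Then,

$$n^{-1/2} b_n^{-1/2} [\log(1/b_n)]^{-1/2}\, \|W_n\|_{\mathcal{G}} = O_P(1),$$

where $\|\cdot\|_{\mathcal{G}}$ denotes the uniform norm over all maps in $\mathcal{G}$.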



Remark. Note that if $K^0$ is of bounded variation with compact support, and if $\psi(\alpha, x) = \alpha'x$,
then (A.7) holds, see Nolan and Pollard (1987).



Proof of Lemma lA.ll As the class Q is not necessarily uniformly bounded, intro- 
duce a truncation bound M„, and consider the class Qn of functions gjy^l^v{x, z,t,6) = 
^, ^)l|C(t)|<Af„- We set Mn = Apply Proposition 1 in Einmahl and Ma- 

son (2005) to the class of functions Qn- Their condition 1 holds, taking the envelope 
G{u,z,t,6) = M„||ir°||oo, and /3 = Cibl/'^ , for some Ci > 0. Condition 2 holds as the class 
Qn is VC. Indeed, the class t — 1(<^ indexed by f G M is VC (see Example 19.6 in Van 
der Vaart, 1998), and hence (1A.7P and Lemma 2.14 (ii) in Pakes and Pollard (1989) yield 
that Qn satisfies their condition 2, whereas condition 3 holds for ao = cr = C2bn, for some 
C2 > 0. For condition 4, we have 



sup \\g\\oo < M„||A'°||oo < c-sVna^, 
g&Qn 

for some constant C3 > 0. 

Applying Proposition 1 in Einmahl and Mason (2005), we can deduce that, for any 
M < 00 sufficiently large, for all m' > and for some > 0, 



nWQr 



>aAe 



sup 



i=l 



U 



where $W_i = (X_i, Z_i, T_i, \delta_i)'$ and $\varepsilon_1, \ldots, \varepsilon_n$ is a sequence of independent Rademacher
random variables, independent of $W_1, \ldots, W_n$. Now we can apply Talagrand's inequality
(1994), see also Einmahl and Mason (2005). Taking $u' = \sqrt{n b_n \log b_n^{-1}}$, deduce that

$$\|W_n\|_{\mathcal{G}_n} = O_P\big((n b_n)^{1/2} [\log b_n^{-1}]^{1/2}\big).$$

Now, for some $c_4 > 0$,

$$\|W_n\|_{\mathcal{G}} \le \|W_n\|_{\mathcal{G}_n} + c_4 \sum_{i=1}^n |\xi(T_i)|\, 1_{|\xi(T_i)| > M_n}.$$

The second term on the right hand side is of the order $O_P(b_n^{-1}) = O_P((\log n)^{1/2} n^{1/2} b_n^{1/2})$.
Indeed, its expectation can be bounded, using Hölder's inequality, by
$n E[|\xi(T)|^3]^{1/3} P(|\xi(T)| > M_n)^{2/3}$. Using Tchebyshev's inequality,

$$P(|\xi(T)| > M_n) \le \frac{E[|\xi(T)|^3]}{M_n^3}.$$

□
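
The truncation step at the end of this proof can be spelled out as follows. This is a sketch that uses only the third-moment condition on $\xi(T)$ and the i.i.d. structure of the sample; it holds for any truncation level $M_n > 0$ and does not depend on the particular choice made in the proof.

```latex
\begin{align*}
E\Big[\sum_{i=1}^n |\xi(T_i)|\, 1_{|\xi(T_i)| > M_n}\Big]
  &= n\, E\big[|\xi(T)|\, 1_{|\xi(T)| > M_n}\big] \\
  &\le n\, E[|\xi(T)|^3]^{1/3}\, P(|\xi(T)| > M_n)^{2/3}
     && \text{(H\"older with exponents $3$ and $3/2$)} \\
  &\le n\, E[|\xi(T)|^3]^{1/3} \Big(\frac{E[|\xi(T)|^3]}{M_n^3}\Big)^{2/3}
     && \text{(Tchebyshev with third moments)} \\
  &= \frac{n\, E[|\xi(T)|^3]}{M_n^2}.
\end{align*}
```

By Markov's inequality, the truncated-away sum is therefore $O_P(n M_n^{-2})$, so the faster $M_n$ grows, the smaller this term, at the price of a larger envelope for the truncated class.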



The second technical lemma shows the consistency of the estimator $\hat G_\theta(t|\lambda(\theta,x))$ and
its vector of partial derivatives, uniformly in $t$, $\theta$ and $x$, and it also establishes the rate of
convergence of the estimator $\hat G_{\hat\theta}(t|\hat g(x))$, uniformly in $t$ and $x$.



Lemma A. 2 Under the assumptions of Theorem \3.S\ we have 

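$$\sup_{t \le \tau,\, \theta \in \Theta,\, x \in \mathcal{X}} |\hat G_\theta(t|\lambda(\theta,x)) - G_\theta(t|\lambda(\theta,x))| = o_P(1), \tag{A.8}$$

$$\sup_{t \le \tau,\, \theta \in \Theta,\, x \in \mathcal{X}} |\nabla_\theta \hat G_\theta(t|\lambda(\theta,x)) - \nabla_\theta G_\theta(t|\lambda(\theta,x))| = o_P(1), \tag{A.9}$$

$$\sup_{t \le \tau}\ \sup_{x \in \mathcal{X}} |\hat G_{\hat\theta}(t|\hat g(x)) - G_{\theta_0}(t|g(x))| = O_P(n^{-1/2} a_n^{-1/2} (\log n)^{1/2}). \tag{A.10}$$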

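Proof. For the first part, with probability tending to 1, for $t \le \tau$, $1 - \hat G_\theta(t|\lambda(\theta,x)) > 0$.
Taking the logarithm, one obtains

$$\log(1 - \hat G_\theta(t|\lambda(\theta,x))) = \sum_{i=1}^n (1 - \delta_i)\, 1_{T_i \le t} \log(1 - W_{n,i}(x,\theta)),$$

where

$$W_{n,i}(x, \theta) = W_n(X_i, T_i; x, \theta) = \frac{K\big(\frac{\lambda(\theta,X_i) - \lambda(\theta,x)}{a_n}\big)}{\sum_{j=1}^n 1_{T_j \ge T_i}\, K\big(\frac{\lambda(\theta,X_j) - \lambda(\theta,x)}{a_n}\big)}.$$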
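
To make the product-limit structure behind this logarithm concrete, the following Python sketch evaluates a kernel-weighted (Beran-type) estimator of the conditional censoring distribution at one point, together with the first-order approximation $1 - \exp(-\sum_i (1-\delta_i) W_{n,i} 1_{T_i \le t})$ obtained by linearizing the logarithm. It is an illustration under stated assumptions, not the authors' implementation: the Epanechnikov kernel, the function names `beran_weights` and `censoring_cdf`, the argument `index` (standing for the values $\lambda(\theta, X_i)$), the at-risk convention $T_j \ge T_i$ and the absence of ties among the $T_i$ are all choices made here.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel; an assumed choice for K, any bounded kernel would do."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def beran_weights(index, t_obs, x0, a_n, kernel=epanechnikov):
    """Weights W_i = K_i / sum_j 1{T_j >= T_i} K_j, with K_i = K((index_i - x0) / a_n)."""
    index = np.asarray(index, dtype=float)
    t_obs = np.asarray(t_obs, dtype=float)
    k = kernel((index - x0) / a_n)
    # at_risk[i, j] = 1 if observation j is still at risk at time T_i
    at_risk = (t_obs[None, :] >= t_obs[:, None]).astype(float)
    denom = at_risk @ k
    # Where the at-risk kernel mass is zero, the weight is set to zero.
    return np.divide(k, denom, out=np.zeros_like(k), where=denom > 0)


def censoring_cdf(t, index, t_obs, delta, x0, a_n, kernel=epanechnikov):
    """Return (product-limit estimate, first-order approximation) of G(t | x0)."""
    t_obs = np.asarray(t_obs, dtype=float)
    delta = np.asarray(delta)
    w = beran_weights(index, t_obs, x0, a_n, kernel)
    # Only censored observations (delta = 0) with T_i <= t contribute a factor.
    use = (delta == 0) & (t_obs <= t)
    exact = 1.0 - np.prod(np.where(use, 1.0 - w, 1.0))
    first_order = 1.0 - np.exp(-np.sum(w[use]))
    return exact, first_order


# Toy data: identical index values, so the estimator reduces to the
# Kaplan-Meier estimator of the censoring distribution.
times = [1.0, 2.0, 3.0, 4.0, 5.0]
status = [1, 0, 1, 0, 1]          # 1 = uncensored, 0 = censored
same_index = [0.0] * 5
print(censoring_cdf(4.0, same_index, times, status, x0=0.0, a_n=1.0))
```

On the toy data the first component equals $1 - (1 - 1/4)(1 - 1/2) = 0.625$. With only five observations the weights are large and the linearization is crude; the two values get closer as the individual weights shrink, which is what the uniform bound on the weights delivers in the proof.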

A Taylor expansion leads to

$$\log(1 - \hat G_\theta(t|\lambda(\theta,x))) = -\sum_{i=1}^n (1 - \delta_i)\, W_{n,i}(x,\theta)\, 1_{T_i \le t} + O_P(n^{-1} a_n^{-2}),$$

where the order of the remainder term is uniform in $t$, $\theta$, $x$, as

$$\sup_{i : T_i \le \tau}\ \sup_{x, \theta} |W_{n,i}(x, \theta)| = O_P(n^{-1} a_n^{-1}).$$

The remainder term is $o_P(1)$ if $n a_n^2 \to \infty$. Rewrite

■ -, nCln . , \ (^n J 

2=1 i = l ^ ' 

nan ^ " V ctn / 



X 



5e(A(g,x),T,)-ge(A(g,x),T,: 
5',(A(^,x),T,)S',(A(^,x),T,) 



where 



5,(A(^,x),y) = [l-iJ,(y|A(^,x))]/z,(A(^,x)), 
A.w, . . 1 /A(^^^-A(M)\ 

Apply Lemma A.1 to obtain the uniform convergence of $\hat S_\theta$ towards $S_\theta$, and to show that

nan ^ " V «n / 

Op 1 . 



sup 



c/i7,,o(s|A(^,x)) 



,l-iJ,(s-|A(^,x)) 

Since 5*^ is uniformly bounded away from zero for ?/ < r, see Assumption [H the result 



follows from 



exp 



\-Ee{s-\\{Q,x))^ 



\-Ge{t\\{e,x)). 



For the gradient, we have 

n 



i=l 



From this, we deduce that the convergence of $\nabla_\theta \hat G_\theta$ follows from the convergence of $\hat G_\theta$,
of $\hat S_\theta$ and of

na2^ - V «n 



and 



1 / 
J2l^^^,Vg\{9,x)K' ( 



, /A(^,X,)-A(^,a;) 
These two quantities can be studied using Lemma A.1, which shows that their centered
versions converge uniformly with rate (na^)~^/^ logra, while the bias term is of order a^. 

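The third result can be deduced from a Taylor expansion, Assumption 5 and Proposition 4.3
in Van Keilegom and Akritas (1999). Indeed, we can deduce that

$$\sup_{t \le \tau}\ \sup_{x \in \mathcal{X}} |\hat G_{\hat\theta}(t|\hat g(x)) - G_{\theta_0}(t|g(x))| \le \sup_{t \le \tau}\ \sup_{x \in \mathcal{X}} |\hat G_{\theta_0}(t|g(x)) - G_{\theta_0}(t|g(x))| + O_P(\|\hat\theta - \theta_0\|).$$

□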

We are now ready to give the proofs of the main results. 

Proof of Theorem 3.2 Part i) of the Theorem can be easily derived by replacing the
differentiability condition in Assumption 5 by a uniform continuity condition on $G_\theta$ with
respect to $\theta$, and by using equation (A.8) in Lemma A.2.
For part ii), a Taylor expansion with respect to $\theta$ leads to



I 0(a;,yMF,-F,](x,y) = -i|;^ 



[l-GeAT^-\m,X,W 

for some $\tilde\theta_n$ between $\hat\theta$ and $\theta_0$. From the convergence of $\hat\theta$ towards $\theta_0$, it follows that $\tilde\theta_n$
tends to $\theta_0$. Moreover, applying equations (A.8) and (A.9) in Lemma A.2 we obtain that



r 1 

/ (l)ix,y)d[F^-Fg]ix,y) = -'Y. 



Si(P{X„T,)VeGe,m - |A(^o,X,))(^ - ^o) 



[1 - Ge,m - \9{X,W 



= f/„(0) + /2„(0), 
with sup^ |-Rn(0)| < |-Rn.('^')| = 0p{n~^^'^), and 



1 ^ 6,<l)iX,,T,)\/eGe,m-\\i9o,X,)) \ f 1 ^ , , y A ^ R' 



with $\sup_\phi |R_n'(\phi)| \le |R_n'(\Phi)| = o_P(n^{-1/2})$. Centering the first sum in $U_n(\phi)$ and applying
a uniform Central Limit Theorem (see e.g. Van der Vaart and Wellner, 1996), we obtain
the stated representation. □

Proof of Theorem 3.3 Consider the difference
|M„(/3,/,J)-M„(/3,/,J)| 

<2 [ \y\ly^,dF^,{x,y) sup \f{/3'x- (3) - f{/3'x- (3)\ 

J x:J{x)=l,l3el3 



+ / ly<r\f{l3'x-/3) + f{/3'x-p)\dFg{x,y) sup /3) - 

The first term on the right hand side converges uniformly to zero by Assumption 9 and
the law of large numbers for $\hat F_{\hat g}$ (see Theorem 3.1 and Theorem 3.2). The integral in the
second term can be bounded by



:i + op(l))x J 2^{x)dFg{x,y), 



where $o_P(1)$ is uniform in $\beta$, by Assumptions 7 and 9-(A.1). Now we have to show that
$M_n(\beta, f, J^*)$ converges to $M(\beta, f, J^*)$ uniformly in $\beta$. For this, apply Theorem 3.1 and
Theorem 3.2 using Assumption 7. By usual arguments for proving consistency (see e.g.
Van der Vaart, 1998, Theorem 5.7), the consistency of $\hat\beta$ follows. □

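Proof of Lemma 3.4 The proof is somewhat similar to the proof of Lemma 5A in
Dominitz and Sherman (2005). First observe that

$$f(\beta'X; \beta) = E[Y \mid \beta'X, Y \le \tau] = E[f(\beta_0'X; \beta_0) \mid \beta'X, Y \le \tau] = \frac{E[f(\beta_0'X; \beta_0)\, 1_{Y \le \tau} \mid \beta'X]}{P(Y \le \tau \mid \beta'X)}.$$

Let $\alpha(X, \beta) = \beta_0'X - \beta'X$. Define

$$\Gamma_X(\beta_1, \beta_2) = E[f(\alpha(X, \beta_1) + \beta_2'X; \beta_0)\, 1_{Y \le \tau} \mid \beta_2'X],$$

and note that $f(\beta'X; \beta) = \Gamma_X(\beta, \beta)/P(Y \le \tau \mid \beta'X)$. Then,

$$\nabla_{\beta_1}\Gamma_X(\beta_0, \beta_0) = -f'(\beta_0'X; \beta_0)\, E[X P(Y \le \tau \mid X) \mid \beta_0'X],$$
$$\nabla_{\beta_2}\Gamma_X(\beta_0, \beta_0) = f'(\beta_0'X; \beta_0)\, X P(Y \le \tau \mid \beta_0'X) + f(\beta_0'X; \beta_0)\, \nabla_\beta h(X, \beta_0),$$

where $h(x, \beta) = P(Y \le \tau \mid \beta'X = \beta'x)$. It follows that

$$\begin{aligned}
\nabla_\beta f(\beta_0'x; \beta_0)
 &= \frac{f'(\beta_0'x; \beta_0)\,\{x P(Y \le \tau \mid \beta_0'X = \beta_0'x) - E[X P(Y \le \tau \mid X) \mid \beta_0'X = \beta_0'x]\}}{P(Y \le \tau \mid \beta_0'X = \beta_0'x)} \\
 &\quad + \frac{\nabla_\beta h(x, \beta_0)\, f(\beta_0'x; \beta_0)}{P(Y \le \tau \mid \beta_0'X = \beta_0'x)} - \frac{\nabla_\beta h(x, \beta_0)\, f(\beta_0'x; \beta_0)\, E[1_{Y \le \tau} \mid \beta_0'X = \beta_0'x]}{P(Y \le \tau \mid \beta_0'X = \beta_0'x)^2} \\
 &= x\, m_1(\beta_0'x) + m_2(\beta_0'x).
\end{aligned} \tag{A.11}$$

Therefore,

$$E[\nabla_\beta f(\beta_0'X; \beta_0)\, 1_{Y \le \tau} \mid \beta_0'X] = \frac{E\big[f'(\beta_0'X; \beta_0)\,\{X P(Y \le \tau \mid \beta_0'X) - E[X P(Y \le \tau \mid X) \mid \beta_0'X]\}\, 1_{Y \le \tau} \mid \beta_0'X\big]}{P(Y \le \tau \mid \beta_0'X)} = 0.$$

□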
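
The cancellation that closes this proof can be checked term by term. The short derivation below is a sketch; the shorthand $p(\beta_0'X) = P(Y \le \tau \mid \beta_0'X)$ and $q(\beta_0'X) = E[X 1_{Y \le \tau} \mid \beta_0'X]$ is introduced here for illustration only and is not notation from the paper.

```latex
\begin{align*}
E[X\, P(Y \le \tau \mid X) \mid \beta_0'X]
  &= E\big[E[X 1_{Y \le \tau} \mid X] \mid \beta_0'X\big] = q(\beta_0'X)
  && \text{(tower property)}, \\
E\big[X\, p(\beta_0'X)\, 1_{Y \le \tau} \mid \beta_0'X\big]
  &= p(\beta_0'X)\, q(\beta_0'X), \\
E\big[q(\beta_0'X)\, 1_{Y \le \tau} \mid \beta_0'X\big]
  &= q(\beta_0'X)\, p(\beta_0'X).
\end{align*}
```

The two terms inside the braces therefore have the same conditional expectation given $\beta_0'X$ once multiplied by $1_{Y \le \tau}$, and the factor $f'(\beta_0'X; \beta_0)$, being a function of $\beta_0'X$, can be taken outside the conditional expectation. The two terms involving $\nabla_\beta h$ cancel as well, since $E[1_{Y \le \tau} \mid \beta_0'X = \beta_0'x] = P(Y \le \tau \mid \beta_0'X = \beta_0'x)$.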



Proof of Theorem 3.5 The proof consists of three steps :

Step 0 : Replace $\hat J$ by $J_0$. For any $B_n$ a sequence of shrinking neighborhoods of $\beta_0$,



sup 

/3eB„ 



M„(/3, /, J) - M„(/3, /, Jo) < op(M„(/3, /, Jo) + n-'). 



See Delecroix, Hristache and Patilea (2006), page 738. Similar arguments apply also when 
the trimming J is defined with /^^(/3^x) justifying the practical implementation of the 
trimming function. 

Step 1 : Bring the problem back to the parametric case.

For notational simplicity, we work with $\nabla_\beta f$ instead of $\nabla_{\tilde\beta} f$. Note that $\nabla_\beta f =
(0, \nabla_{\tilde\beta} f')'$. We will show that, on $B_n$,



M^{l3J,Jo)=Mn{l3J,Jo) + op 
where C'^ does not depend on /3. Decompose 



11/3 -/3o| 



Op 



Mn[/3j,Jo) = M„ (/?,/, Jo) 



5,Jo(X,)(T,-/(/3'X,;/3))1t,<. 



n 



n ^ 1 



(T,-|^(X,)) 
^j-^o(Xi)lr,<T 



/(/3%;/3)-/(/3%;/3) 



Mj/3,/,Jo)-2A„ + 5i„. 



/(/3%;/3)-/(/3%;/3) 



Step 1.1 : Study of $A_{1n}$.

$A_{1n}$ can be expressed as



Air. 



n 



1=1 



G^{T,-\g{X,)) 



/(/3^X,;/3o)-/(/3^X,;/3o) 



_^ 1 SjMX,)1t,<, if iP'.Xf, /3o) - / /?)) 



X 



1 - (T, - 

/(/3^X,;/3o)-/(/3^X,;/3o) 



_^ 1 ^,Jo(X ,)lr.<. (/ /3o) - / (/?%; /?)) 



n 



1 - (T, - |^(X,)) 



1 6,MX,) {Ti - / /3o)) lT.<r 



i_G,-(T,-|^(X,)) 

does not depend on $\beta$. For $A_{3n}$, observe that, for any $\beta \in B_n$, we can replace
$J_0(X_i)$ by $1_{f^{\tau}_{\beta}(\beta'X_i) > c/2}$ using Assumption 6. As $\nabla_\beta f(\beta'x; \beta)$ is a bounded function of $x$
and $\beta$ (Assumption 8, since the class of functions has a bounded envelope), and using
the uniform convergence of $\nabla_\beta \hat f(\beta'x; \beta)$ (Assumption 9), we can obtain from a first order
Taylor expansion applied twice (for $f(\beta'x; \beta)$ and for $\hat f(\beta'x; \beta) - f(\beta'x; \beta)$ around $\beta_0$),
that $A_{3n} = o_P(\|\beta - \beta_0\|^2)$.

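For $A_{4n}$, first replace $\hat G_{\hat\theta}$ with $G_{\theta_0}$. For this, note that $[1 - \hat G_{\hat\theta}(T_i - |\hat g(X_i))]$ is bounded
away from zero with probability tending to 1 for $T_i \le \tau$, and that

$$\sup_{t \le \tau,\, x : J_0(x) = 1} \big|\hat G_{\hat\theta}(t|\hat g(x)) - G_{\theta_0}(t|g(x))\big|\, \big|\hat f(\beta_0'x; \beta_0) - f(\beta_0'x; \beta_0)\big| = o_P(n^{-1/2}) \tag{A.12}$$

using part 2 of Assumption 11 and Lemma A.2. A first order Taylor expansion for
$f(\beta'x; \beta) - f(\beta_0'x; \beta_0)$ and property (A.12) lead to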



A 



An 



1 6MX^nT,<r if {P'oXf, Po) ' f (/?%; P)) 



i=l 



-Op 



1 - Ge, (T, - \g{Xi)) 



f{P',x,-Po)-f{p',x,-p,) 



11/3 -/3o| 



Next, a second order Taylor expansion shows that the first term above can be rewritten
as



^ l-Ge~{T,-\g{X;)) 



i=l 



+op{\\(3-(3o 



(A.13) 



To show that this term is negligible, we will use empirical process theory. We have that
$f \in \mathcal{H}_1$, where $\mathcal{H}_1$ is the Donsker class defined in Assumption 10, and $\hat f \in \mathcal{H}_1$ with
probability tending to 1. Consequently, the class of functions



"Hi = < (y, c, X, t) 



G Hi 



I - Ge^{y ^c - \g{x)) 

is a Donsker class, see Example 2.10.8 in Van der Vaart and Wellner (1996). Furthermore,
for all $\phi \in \mathcal{H}_1$,



E 



l-GeAT-\g{X)) 



E[V^/(/3^X;/3o)0(/3^X)Jo(X)l 



= 0, 

(A.14) 

since $E[\nabla_\beta f(\beta_0'X; \beta_0)\, 1_{Y \le \tau} \mid \beta_0'X] = 0$ (see Lemma 3.4), and since $J_0(X)$ is a function of
$\beta_0'X$ alone. Since $\mathcal{H}_1'$ is a Donsker class, and since $\hat f$ tends uniformly to $f$, we deduce that
the first term in (A.13) is of order $o_P(\|\beta - \beta_0\| n^{-1/2})$. See the asymptotic equicontinuity
of Donsker classes, cf. Van der Vaart and Wellner (1996), Section 2.1.2.

For $A_{5n}$, apply a second order Taylor expansion. Using that $\nabla^2_{\beta,\beta} f$ is bounded, and
that $\nabla^2_{\beta,\beta} \hat f$ converges uniformly to $\nabla^2_{\beta,\beta} f$, we obtain



, _ (/3 - ^ 6,MX,)lTMT^- f{f3'oXf, /3o))[V^/(/3^X,; f3o) - V^f{f3',Xf, f3o)] 

^5n — 



n 



i=l 



1 - G^{T, - \g{X,)) 



+opi\\/3 - (3o\ 



Proceed as for $A_{4n}$ to replace $\hat G$ and $\hat g$ by $G$ and $g$, using part 3 of Assumption 11. The
same arguments as for $A_{4n}$ can then be used, but considering instead the Donsker class



{y,c,x) 



'-y<c 



Jo{x)ly<riy - fiP'^x; (3o))(j){x) 



G H 



2 t ' 



1 - Geoiy - \g{x)) 

where $\mathcal{H}_2$ is defined in Assumption 10, and observing that, for any function $\phi$,

$$E\left[\frac{\delta J_0(X)\phi(X)\,(Y - f(\beta_0'X; \beta_0))\, 1_{T \le \tau}}{1 - G_{\theta_0}(T - |g(X))}\right] = E\big[E[(Y - f(\beta_0'X; \beta_0))\, 1_{Y \le \tau} \mid X]\, J_0(X)\phi(X)\big] = 0,$$

by the definition of our regression model. Deduce that $A_{5n} = o_P(\|\beta - \beta_0\| n^{-1/2} + \|\beta - \beta_0\|^2)$.

Step 1.2 : Study of $B_{1n}$.

Rewrite $B_{1n}$ as



Bin 



n ^ 1 



^i<>^o(^i)lT,<r 



G^(r,-|^(x,)) 



--E- ^ 

'^j-^o(^i)lTi<T 



+ -> 

n ^ 1 - 



/(/3^X,;/3o)-/(/3^X,;/3o) 
/(/3^X,;/3o)-/(/3;X,;/3o) 



Observe that, for any $\beta \in B_n$, we can replace $J_0(X_i)$ by $1_{f^{\tau}_{\beta}(\beta'X_i) > c/2}$ using Assumption
6. Next, by a Taylor expansion and the uniform convergence of $\nabla_\beta \hat f$, we have that
$B_{2n} = o_P(\|\beta - \beta_0\|^2)$. The term $B_{3n}$ does not depend on $\beta$. For $B_{4n}$, a second order
Taylor expansion leads to



B. 



4n 



1=1 



1 - (T, - \g{X,)) 



X [V;3/(/3^X,; /3o) - V^/(/3^X,; /3o)] + Op 



/3o 



Replace $\hat G$ by $G$ and use Assumption 11, part 1, to conclude.

Step 2 : Study of $M_n(\beta, f, J_0)$.

Observe that, on $o_P(1)$-neighborhoods of $\beta_0$, from a Taylor expansion,

$$M_n(\beta, f, J_0) - M_n(\beta_0, f, J_0) = (\beta - \beta_0)'\nabla_\beta M_n(\beta_0, f, J_0) + \tfrac{1}{2}(\beta - \beta_0)'\nabla^2_{\beta,\beta} M_n(\beta_0, f, J_0)(\beta - \beta_0) + o_P(\|\beta - \beta_0\|^2),$$

and apply Theorem 1 and 2 of Sherman (1994) to conclude. □



Proof of Proposition 3.6 The uniform convergence results in Assumptions 9 and 11
can be deduced from studying the uniform convergence rate of the numerator and the
denominator in (3.4) (and their derivatives) separately. This is a consequence of Lemma
A.1. Since the other terms can be studied in a similar way, we only consider the case of
the denominator and its derivatives in (3.4). In each case, the bias part can be dealt with
uniformly with classical kernel arguments, and is of order $h^2$. For the centered version of
$f^*$, the result can be deduced from the study of the uniform convergence rate of empirical
processes indexed by some class of functions as the one defined in (A.6), with



C/3(x,X, Z,T,(5) - 



1-G,„(T-|Z)' 

where $j = 0$ (resp. $1, 2$) for $f^*$ (resp. $\nabla_\beta f^*$, $\nabla^2_{\beta,\beta} f^*$), and $\xi(T) = T$. The kernel in
(A.6) is either $K$ or $K'$ or $K''$, and $\psi(\beta, x) = \beta'x$. It follows from the conditions on $K$
and from Nolan and Pollard (1987) that the class of functions

$$\Big\{x \mapsto K\Big(\frac{\beta'x - \beta'u}{h}\Big) : u \in \mathcal{X},\ h > 0,\ \beta \in \mathcal{B}\Big\}$$

is a VC-class of functions. Moreover $u \mapsto (x - u)^j$ ($j = 0, 1, 2$) is also a VC-class of
bounded functions using permanence properties of VC-classes, see Lemma 2.6.18 in Van
der Vaart and Wellner (1996). Finally, since $1 - G_{\theta_0}(T - |Z)$ is bounded away from zero,
(A.7) holds. Now applying Lemma A.1 we get



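$$\sup_{\beta, x} |f^*(\beta'x; \beta) - f(\beta'x; \beta)| = O_P\big((\log n)^{1/2} n^{-1/2} h^{-1/2} + h^2\big),$$

$$\sup_{\beta, x} |\nabla_\beta f^*(\beta'x; \beta) - \nabla_\beta f(\beta'x; \beta)| = O_P\big((\log n)^{1/2} n^{-1/2} h^{-3/2} + h^2\big),$$

$$\sup_{\beta, x} |\nabla^2_{\beta,\beta} f^*(\beta'x; \beta) - \nabla^2_{\beta,\beta} f(\beta'x; \beta)| = O_P\big((\log n)^{1/2} n^{-1/2} h^{-5/2} + h^2\big),$$

where $h^2$ comes from the bias term. Hence, Assumption 9 holds if $h \to 0$ and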
nh^ {log n)^^^'^ — 7- oo. Assumption [TT] holds if {log n)~^n^^^an'^h — )■ oo, and nh^ — )■ 0. 
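
As a quick numerical reading of these rates, the sketch below evaluates the three bounds $(\log n)^{1/2} n^{-1/2} h^{-(2j+1)/2} + h^2$ for $j = 0, 1, 2$ at a given sample size and bandwidth. The function name `uniform_rates` and the example values of `n` and `h` are illustrative assumptions, and multiplicative constants are ignored; the point is only that each additional derivative costs a factor $h^{-1}$ in the stochastic term while the bias term stays at $h^2$.

```python
import math


def uniform_rates(n, h):
    """Rates (log n)^{1/2} n^{-1/2} h^{-(2j+1)/2} + h^2 for derivative orders j = 0, 1, 2."""
    stochastic = math.sqrt(math.log(n) / n)   # common factor (log n)^{1/2} n^{-1/2}
    bias = h ** 2                             # bias order, identical for all j
    return [stochastic * h ** (-(2 * j + 1) / 2) + bias for j in range(3)]


# Compare a few bandwidths at a fixed sample size (constants ignored).
for h in (0.30, 0.15, 0.05):
    rates = uniform_rates(n=100_000, h=h)
    print(h, [round(r, 4) for r in rates])
```

Shrinking `h` reduces the bias term but inflates the stochastic term, most severely for the second derivative, which is why the bandwidth conditions are driven by the $h^{-5/2}$ term.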

The first part of Assumption 10 follows directly from the uniform convergence of $f^*$.
Elementary algebra shows that the gradient of $f^*$ can be written as

$$\nabla_\beta f^*(\beta_0'x; \beta_0) = x\, m_1^*(\beta_0'x) + m_2^*(\beta_0'x). \tag{A.15}$$

Using the same arguments as above, these two functions converge uniformly to $m_1(\beta_0'x)$
and $m_2(\beta_0'x)$ respectively, where $\nabla_\beta f(\beta_0'x; \beta_0) = x\, m_1(\beta_0'x) + m_2(\beta_0'x)$, see equation (A.11),
and Assumption 10 follows. □

Proof of Proposition 3.7 The result can be deduced from studying the following type
of quantities :



1=1 ^ ^ 



1-G,~(T,- |A(^,X,)) 



l-GeM-\KKXi))_ 
where $j = 0, 1, 2$ and $k = 0, 1$. Using Lemma A.2, equation (A.10), and the fact that



sup 

/3 



i=i ^ 



h 



m\ 



Op(l), 



we can deduce that this type of quantities is of the order $O_P((\log n)^{1/2} h^{-j} a_n^{-1/2} n^{-1/2})$
($j = 0, 1, 2$). Hence the convergence rates follow. □